81 research outputs found

    Model selection via Bayesian information capacity designs for generalised linear models

    The first investigation is made of designs for screening experiments in which the response variable is approximated by a generalised linear model. A Bayesian information capacity criterion is defined for the selection of designs that are robust to the form of the linear predictor. For binomial data and logistic regression, the effectiveness of these designs for screening is assessed through simulation studies using all-subsets regression and model selection via maximum penalised likelihood and a generalised information criterion. For Poisson data and log-linear regression, similar assessments are made using maximum likelihood and the Akaike information criterion for minimally supported designs that are constructed analytically. The results show that effective screening, that is, high power with moderate type I error rate and false discovery rate, can be achieved through suitable choices for the number of design support points and the experiment size. Logistic regression is shown to present a more challenging problem than log-linear regression. Some areas for future work are also indicated.
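
    The abstract does not state the criterion's exact form, so the following is a hedged sketch only: one standard way to define an "information capacity" style objective is Bayesian D-optimality averaged over a set of candidate linear predictors. The model set, weights and priors below are placeholders, not the paper's definitions.

        % Sketch of an information-capacity-style criterion (assumed form):
        \[
          \phi(\xi) \;=\; \sum_{m \in \mathcal{M}} p(m)
            \int \log \det \mathcal{I}_m(\beta; \xi)\, p(\beta \mid m)\, d\beta,
        \]
        % where $\mathcal{I}_m(\beta;\xi)$ is the Fisher information under model $m$;
        % e.g. for logistic regression,
        % $\mathcal{I}_m(\beta;\xi) = \sum_i w_i\,\pi_i(1-\pi_i)\,f_m(x_i) f_m(x_i)^{\top}$.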

    Gibbs optimal design of experiments

    Bayesian optimal design of experiments is a well-established approach to planning experiments. Briefly, a probability distribution for the responses, known as a statistical model, is assumed, which depends on a vector of unknown parameters. A utility function is then specified, giving the gain in information for estimating the true values of the parameters using the Bayesian posterior distribution. A Bayesian optimal design is found by maximising the expectation of the utility with respect to the joint distribution implied by the statistical model and the prior distribution for the true parameter values. The approach accounts for the experimental aim via the specification of the utility, and for all assumed sources of uncertainty via the expected utility. However, it is predicated on the specification of the statistical model. Recently, a new type of statistical inference, known as Gibbs (or General Bayesian) inference, has been advanced. This is Bayesian-like, in that uncertainty about unknown quantities is represented by a posterior distribution, but it does not necessarily rely on the specification of a statistical model, so the resulting inference should be less sensitive to model misspecification. The purpose of this paper is to propose Gibbs optimal design: a framework for the optimal design of experiments for Gibbs inference. The concept behind the framework is introduced, along with a computational approach for finding Gibbs optimal designs in practice. The framework is demonstrated on exemplars including linear models and experiments with count and time-to-event responses.
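
    As a hedged sketch of the two ingredients named above: a Bayesian optimal design maximises expected utility, while Gibbs (General Bayesian) inference replaces the likelihood in the posterior update with a loss function. The loss $\ell$ and learning rate $w$ below are generic placeholders, not the paper's specific choices.

        % Expected-utility design objective and a Gibbs (loss-based) posterior:
        \[
          d^{*} = \arg\max_{d}\, \mathbb{E}_{\theta, y \mid d}\big[\, u(d, y, \theta) \,\big],
          \qquad
          \pi_{G}(\theta \mid y) \;\propto\; \exp\{-w\, \ell(\theta; y)\}\, \pi(\theta).
        \]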

    An approach for finding fully Bayesian optimal designs using normal-based approximations to loss functions

    The generation of decision-theoretic Bayesian optimal designs is complicated by the significant computational challenge of minimising an analytically intractable expected loss function over a potentially high-dimensional design space. A new general approach for approximately finding Bayesian optimal designs is proposed, which uses computationally efficient normal-based approximations to posterior summaries to aid in approximating the expected loss. This new approach is demonstrated on illustrative, yet challenging, examples including hierarchical models for blocked experiments and the experimental aims of parameter estimation and model discrimination. Where possible, the results of the proposed methodology are compared, in terms of both performance and computing time, to results from using computationally more expensive, but potentially more accurate, Monte Carlo approximations. Moreover, the methodology is also applied to problems where the use of Monte Carlo approximations is computationally infeasible.
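
    A hedged sketch of what a normal-based approximation of this kind can look like (a Laplace-type approximation; the paper's exact construction may differ):

        % Normal (Laplace-type) approximation to the posterior under design $d$:
        \[
          \pi(\theta \mid y, d) \;\approx\;
          \mathrm{N}\!\big(\hat{\theta}(y, d),\; \mathcal{H}^{-1}(y, d)\big),
        \]
        % with $\hat{\theta}$ the posterior mode and $\mathcal{H}$ the negative Hessian
        % of the log posterior at the mode. Posterior summaries entering the loss then
        % have closed forms, leaving only the outer expectation over $y$ to approximate.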

    Modelling Survival Data to Account for Model Uncertainty: A Single Model or Model Averaging?

    This study considered the problem of predicting survival based on three alternative models: a single Weibull, a mixture of Weibulls, and a cure model. Instead of the common procedure of choosing a single "best" model, where "best" is defined in terms of goodness of fit to the data, a Bayesian model averaging (BMA) approach was adopted to account for model uncertainty. This was illustrated using a case study in which the aim was the description of lymphoma cancer survival with covariates given by phenotypes and gene expression. The results of this study indicate that if the sample size is sufficiently large, one of the three models emerges as having the highest probability given the data, as indicated by the goodness-of-fit measure, the Bayesian information criterion (BIC). However, when the sample size was reduced, no single model was revealed as "best", suggesting that a BMA approach would be appropriate. Although a BMA approach can compromise on goodness of fit to the data (when compared to the true model), it can provide robust predictions and facilitate more detailed investigation of the relationships between gene expression and patient survival.
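
    For concreteness, the standard BMA identities underlying such a study are shown below; the BIC-based weight approximation is a common choice and is assumed here rather than taken from the paper.

        % BMA prediction over the three candidate models $M_1, M_2, M_3$
        % (single Weibull, Weibull mixture, cure model), for a quantity $\Delta$:
        \[
          p(\Delta \mid y) = \sum_{k=1}^{3} p(\Delta \mid M_k, y)\, p(M_k \mid y),
          \qquad
          p(M_k \mid y) \;\approx\;
          \frac{\exp(-\mathrm{BIC}_k / 2)}{\sum_{j} \exp(-\mathrm{BIC}_j / 2)}.
        \]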

    Robust designs for Poisson regression models

    We consider the problem of constructing robust designs for Poisson regression models. An analytical expression is derived for robust designs for first-order Poisson regression models where uncertainty exists in the prior parameter estimates. Given certain constraints in the methodology, it may be necessary to extend the robust designs for implementation in practical experiments. With these extensions, our methodology constructs designs which perform similarly, in terms of estimation, to those from current techniques, while delivering the solution in a more timely manner. We further apply this analytic result to cases where uncertainty exists in the linear predictor. The application of this methodology to practical design problems such as screening experiments is explored. Given the minimal prior knowledge that is usually available when conducting such experiments, it is recommended to derive designs that are robust across a variety of systems. However, incorporating such uncertainty into the design process can be a computationally intense exercise; hence, our analytic approach is explored as an alternative.
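
    As background for the analytical expression mentioned above, a first-order Poisson regression model and the information matrix of a design take the standard forms below; the robustness construction itself (e.g. averaging or maximin over a prior parameter region) is not reproduced here.

        % First-order Poisson (log-linear) regression and Fisher information of a
        % design $\xi$ with support points $x_i$ and weights $w_i$:
        \[
          \log \mu(x) = \beta_0 + \textstyle\sum_{j} \beta_j x_j,
          \qquad
          M(\xi, \beta) = \sum_{i} w_i\, \mu(x_i)\, f(x_i) f(x_i)^{\top},
        \]
        % where $f(x) = (1, x_1, \dots, x_k)^{\top}$; robust criteria average or
        % maximin this information over the assumed parameter uncertainty.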

    Bayesian spatio-temporal models for stream networks

    Spatio-temporal models are widely used in many research areas, including ecology. The recent proliferation of in-situ sensors in streams and rivers supports space-time water-quality modelling and monitoring in near real time. In this paper, we introduce a new family of dynamic spatio-temporal models in which spatial dependence is established based on stream distance and temporal autocorrelation is incorporated using vector autoregression. We propose several variations of these novel models within a Bayesian framework. Our results show that the proposed models perform well on spatio-temporal data collected from real stream networks, particularly in terms of out-of-sample root mean squared prediction error (RMSPE). This is illustrated with a case study of water temperature data from the northwestern United States.
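
    A hedged sketch of how such a model can be assembled (the specific covariance family and its parameters below are assumptions; the paper proposes several variations):

        % Vector-autoregressive temporal dynamics with stream-distance covariance:
        \[
          y_t = \mu + \Phi\,(y_{t-1} - \mu) + \varepsilon_t,
          \qquad
          \varepsilon_t \sim \mathrm{N}(0, \Sigma),
        \]
        % with, e.g., a tail-up exponential form for flow-connected sites $i, j$ at
        % hydrologic (stream) distance $h_{ij}$:
        % $\Sigma_{ij} = \sigma^2\, \omega_{ij} \exp(-h_{ij}/\alpha) + \tau^2 \mathbb{1}(i = j)$,
        % where $\omega_{ij}$ are flow-based spatial weights.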

    A framework for automated anomaly detection in high frequency water-quality data from in situ sensors

    River water-quality monitoring is increasingly conducted using automated in situ sensors, enabling timelier identification of unexpected values. However, anomalies caused by technical issues confound these data, while the volume and velocity of the data prevent manual detection. We present a framework for automated anomaly detection in high-frequency water-quality data from in situ sensors, using turbidity, conductivity and river-level data. After identifying end-user needs and defining anomalies, we ranked anomaly types by importance and selected suitable detection methods. High-priority anomalies included sudden isolated spikes and level shifts, most of which were classified correctly by regression-based methods such as autoregressive integrated moving average (ARIMA) models. However, using other water-quality variables as covariates reduced performance, due to complex relationships among the variables. Classification of drift, and of periods of anomalously low or high variability, improved when we replaced anomalous measurements with forecasts, but this inflated false positive rates. Feature-based methods also performed well on high-priority anomalies but were less proficient at detecting lower-priority anomalies, resulting in high false negative rates. Unlike regression-based methods, all feature-based methods produced low false positive rates and did not require training or optimisation. Rule-based methods successfully detected impossible values and missing observations. We therefore recommend using a combination of methods to improve anomaly detection performance whilst minimising false detection rates. Furthermore, our framework emphasises the importance of communication between end-users and analysts for optimal outcomes with respect to both detection performance and end-user needs. The framework is applicable to other types of high-frequency time-series data and other anomaly detection applications.
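
    A minimal sketch of the regression-based component described above: fit an ARIMA model to a univariate sensor series and flag observations whose one-step-ahead forecast errors are extreme. The model order, threshold and synthetic data are illustrative, not the authors' implementation.

        # Minimal sketch: ARIMA-based spike detection for one sensor series.
        import numpy as np
        from statsmodels.tsa.arima.model import ARIMA

        def flag_spikes(series, order=(1, 1, 1), k=4.0):
            """Flag points whose one-step-ahead residual exceeds k robust SDs."""
            fit = ARIMA(series, order=order).fit()
            resid = np.asarray(fit.resid)            # in-sample one-step errors
            mad = np.median(np.abs(resid - np.median(resid)))
            return np.abs(resid) > k * 1.4826 * mad  # boolean anomaly mask

        # Synthetic turbidity-like series with two injected spikes:
        rng = np.random.default_rng(1)
        y = 10 + np.cumsum(rng.normal(0, 0.1, 500))
        y[120] += 5.0
        y[340] -= 4.0
        print(np.flatnonzero(flag_spikes(y)))        # indices flagged as anomalous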

    Are the current gRNA ranking prediction algorithms useful for genome editing in plants?

    Introducing a new trait into a crop through conventional breeding commonly takes decades, but recently developed genome sequence modification technology has the potential to accelerate this process. One of these new breeding technologies relies on an RNA-directed DNA nuclease (CRISPR/Cas9) to cut the genomic DNA in vivo, facilitating the deletion or insertion of sequences. This sequence-specific targeting is determined by guide RNAs (gRNAs). However, choosing an optimal gRNA sequence has its challenges. Almost all current gRNA design tools for use in plants are based on data from experiments in animals, although many allow the use of plant genomes to identify potential off-target sites. Here, we examine the predictive uniformity and performance of eight different online gRNA-site prediction tools. Unfortunately, there was little consensus among the rankings produced by the different algorithms, nor was there a statistically significant correlation between rankings and in vivo effectiveness. This suggests that important factors affecting gRNA performance and/or target-site accessibility in plants are yet to be elucidated and incorporated into gRNA-site prediction tools.
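
    A hedged sketch of the kind of concordance analysis this describes: pairwise Spearman rank correlations among tool scores, and between each tool's scores and measured editing efficiency. All tool names, scores and efficiencies below are invented for illustration.

        # Sketch: rank agreement among gRNA-scoring tools and with in vivo results.
        from itertools import combinations
        from scipy.stats import spearmanr

        tool_scores = {   # hypothetical per-gRNA scores from three tools
            "toolA": [0.9, 0.4, 0.7, 0.2, 0.6],
            "toolB": [0.3, 0.8, 0.5, 0.6, 0.1],
            "toolC": [0.7, 0.2, 0.9, 0.4, 0.5],
        }
        efficiency = [0.55, 0.20, 0.75, 0.10, 0.40]  # hypothetical in vivo data

        for a, b in combinations(tool_scores, 2):    # tool-vs-tool consensus
            rho, p = spearmanr(tool_scores[a], tool_scores[b])
            print(f"{a} vs {b}: rho={rho:.2f}, p={p:.2f}")
        for t, s in tool_scores.items():             # tool-vs-experiment
            rho, p = spearmanr(s, efficiency)
            print(f"{t} vs in vivo: rho={rho:.2f}, p={p:.2f}")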
    • …